Support Vector Machine with Social Media Data

SVM explanation

An SVM classifier fits a line (or a plane or hyperplane, depending on the dimensionality of the data) in an N-dimensional space to separate data points that belong to two classes. The original SVM classifier was designed for exactly this kind of binary classification problem. Unlike linear regression, which finds the line of best fit by minimizing the sum of squared errors (when using OLS), or logistic regression, which uses maximum likelihood estimation to find the best-fitting sigmoid curve, a Support Vector Machine picks the separating boundary that maximizes the margin, that is, the distance between the boundary and the nearest points of each class.

  • The SVM algorithm predicts a class label for each data point. One of the classes is encoded as 1 and the other as -1.

  • Like most machine learning algorithms, SVM converts the business problem into a mathematical equation involving unknowns. These unknowns are found by recasting the problem as an optimization problem, which minimizes or maximizes some quantity while adjusting the unknowns. For the SVM classifier, that quantity is a loss function known as the hinge loss, which is minimized to find the maximum margin.
    [Figure: hinge loss function]

  • For ease of understanding, this loss function can also be read as a cost function whose cost is 0 when every point is classified correctly and lies outside the margin. Otherwise, a loss is calculated. There is a trade-off between maximizing the margin and the loss incurred when the margin becomes very large. A regularization parameter is added to balance the two.
    [Figure: loss function for SVM]

  • As with most optimization problems, the weights are optimized by computing gradients, that is, the partial derivatives of the loss with respect to each weight.
    [Figure: gradients]

  • When a point is classified correctly, the weights are updated using only the regularization term; when a point is misclassified, the update also includes the gradient of the loss function.
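The bullets above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the solver that scikit-learn's SVC uses: it assumes a linear model f(x) = w·x - b, labels encoded as -1 and +1, and a per-sample loss of lam * ||w||^2 + max(0, 1 - y * f(x)). The names fit_linear_svm, hinge_loss, lam, and lr, and the toy data, are made up for this example.

```python
import numpy as np

def hinge_loss(w, b, X, y, lam):
    """Regularized hinge loss averaged over the samples (y holds -1 / +1)."""
    margins = y * (X @ w - b)
    return lam * np.dot(w, w) + np.mean(np.maximum(0.0, 1.0 - margins))

def fit_linear_svm(X, y, lam=0.01, lr=0.01, epochs=200):
    """Per-sample sub-gradient descent on the regularized hinge loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * (np.dot(w, x_i) - b) >= 1:
                # Correct side of the margin: only the regularizer contributes.
                w -= lr * (2 * lam * w)
            else:
                # Inside the margin or misclassified: the hinge term contributes too.
                w -= lr * (2 * lam * w - y_i * x_i)
                b -= lr * y_i
    return w, b

# Toy, linearly separable data (hypothetical values).
X_toy = np.array([[-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5],
                  [1.0, 2.0], [2.0, 1.0], [1.5, 1.5]])
y_toy = np.array([-1, -1, -1, 1, 1, 1])

w, b = fit_linear_svm(X_toy, y_toy)
pred = np.sign(X_toy @ w - b)
print(pred, hinge_loss(w, b, X_toy, y_toy, lam=0.01))
```

A larger lam favors a wider margin at the cost of more margin violations; a smaller lam does the opposite.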

Experiment

We have a dataset of personal data from a social media company. Its features include age, salary, and a factor variable stating whether the customer purchased the item they were advertised. I am using the scikit-learn package to perform this analysis.

Step 1 - Import packages and clean data

  • I am going to use pandas and numpy to clean our data
  • matplotlib will be used to visualize
In [11]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
In [12]:
dataset = pd.read_csv('mediaads.csv')
dataset
Out[12]:
      User ID  Gender  Age  EstimatedSalary  Purchased
0    15624510    Male   19            19000          0
1    15810944    Male   35            20000          0
2    15668575  Female   26            43000          0
3    15603246  Female   27            57000          0
4    15804002    Male   19            76000          0
...       ...     ...  ...              ...        ...
395  15691863  Female   46            41000          1
396  15706071    Male   51            23000          1
397  15654296  Female   50            20000          1
398  15755018    Male   36            33000          0
399  15594041  Female   49            36000          1

400 rows × 5 columns

Our dataset has 4 feature columns and one target variable. User ID is only an identifier and Gender is not used here, so Age and EstimatedSalary (columns 2 and 3) are the numerical features I expect to be associated with a purchase. I save these to X as predictors, and I save the purchase decision (column 4) as y since it is the target.

In [13]:
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
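Selecting columns by position works, but it breaks silently if the column order in the file ever changes. A small sketch of the same selection by column name follows. It rebuilds only the first five rows shown in the output above, so that it runs without mediaads.csv; the variable names sample, X_by_name, y_by_name, and X_by_pos are made up for this example.

```python
import pandas as pd

# First five rows copied from the output above; the real file has 400 rows.
sample = pd.DataFrame({
    "User ID": [15624510, 15810944, 15668575, 15603246, 15804002],
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Age": [19, 35, 26, 27, 19],
    "EstimatedSalary": [19000, 20000, 43000, 57000, 76000],
    "Purchased": [0, 0, 0, 0, 0],
})

# Select predictors and target by name instead of by position.
X_by_name = sample[["Age", "EstimatedSalary"]].to_numpy()
y_by_name = sample["Purchased"].to_numpy()

# The positional selection used in the notebook gives the same array.
X_by_pos = sample.iloc[:, [2, 3]].values
print((X_by_name == X_by_pos).all())
```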

Step 2 - Split data

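I will use a 25/75 test/train split. sklearn does most of this for me with the train_test_split function. StandardScaler then standardizes each feature to zero mean and unit variance. It is fit on the training set only, and the same transformation is then applied to the test set, so no information from the test data leaks into training.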

In [14]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
In [15]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
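To make the scaling step concrete, here is a rough NumPy equivalent of what fit_transform and transform compute. It is a sketch on synthetic data: the random Age-like and salary-like columns and the names X_demo, X_tr, and X_te are assumptions for this example, not the notebook's data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the Age and EstimatedSalary columns.
X_demo = np.column_stack([
    rng.integers(18, 61, size=400),
    rng.integers(15000, 150001, size=400),
]).astype(float)
X_tr, X_te = X_demo[:300], X_demo[300:]

# "fit": learn the per-column statistics from the training rows only.
mean = X_tr.mean(axis=0)
std = X_tr.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": apply the training statistics to both sets.
X_tr_scaled = (X_tr - mean) / std
X_te_scaled = (X_te - mean) / std

print(X_tr_scaled.mean(axis=0).round(6), X_tr_scaled.std(axis=0).round(6))
```

The test set will generally not have exactly zero mean and unit variance after this step, which is expected because its own statistics were never used.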

Step 3 - Create model

Using the SVC class from sklearn, we can choose the Radial Basis Function (RBF) kernel through the kernel parameter.

The fit method is how scikit-learn estimators are trained. It takes a dataset (typically a 2D array or matrix) and a set of labels, and fits the model to that data.

In [16]:
from sklearn.svm import SVC
classifier = SVC(kernel = 'rbf', random_state = 0)
classifier.fit(X_train, y_train)
Out[16]:
SVC(random_state=0)
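For intuition about what the RBF kernel measures, the sketch below computes K(x, z) = exp(-gamma * ||x - z||^2) by hand. The helper rbf_kernel_value and the three demo points are made up for this example. The gamma value follows the 'scale' heuristic, 1 / (n_features * X.var()), which is my understanding of SVC's default in recent scikit-learn versions; treat that as an assumption and check the documentation for the version in use.

```python
import numpy as np

def rbf_kernel_value(x, z, gamma):
    """RBF similarity: 1 for identical points, approaching 0 as they move apart."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

# Three hypothetical points in a 2-feature space.
X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])

# 'scale' heuristic: 1 / (n_features * variance of all entries in X).
gamma = 1.0 / (X_demo.shape[1] * X_demo.var())

near = rbf_kernel_value(X_demo[0], X_demo[1], gamma)
far = rbf_kernel_value(X_demo[0], X_demo[2], gamma)
print(near, far)
```

Because the kernel depends on distances, features on very different scales (such as age and salary) would otherwise be dominated by the larger one, which is one reason the data was standardized first.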

Step 4 - Predict

Using the predict() method, I can predict the labels for a new set of data. This method accepts one argument, the new data X_new (e.g. model.predict(X_new)), and returns the predicted label for each sample in the array. We can analyze our model's performance with the confusion_matrix function.

In [17]:
y_pred = classifier.predict(X_test)
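One detail is easy to miss when predicting for genuinely new customers: they must be passed through the same fitted scaler before calling predict. The sketch below shows this on synthetic data. The labelling rule, the two example customers, and the names scaler, model, X_new, and labels are all made up for this example and are not taken from the real dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for the real data: an age column and a salary column.
age = rng.uniform(18, 60, size=200)
salary = rng.uniform(15000, 150000, size=200)
X_demo = np.column_stack([age, salary])
# Hypothetical labelling rule, used only to give the demo some structure.
y_demo = ((age > 40) | (salary > 90000)).astype(int)

scaler = StandardScaler()
model = SVC(kernel="rbf", random_state=0)
model.fit(scaler.fit_transform(X_demo), y_demo)

# New customers in original units (age, salary); scale them with the
# scaler fitted on the training data, never with a freshly fitted one.
X_new = np.array([[30, 50000], [50, 120000]])
labels = model.predict(scaler.transform(X_new))
print(labels)
```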
In [18]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test,y_pred)
[[64  4]
 [ 3 29]]
Out[18]:
0.93

An accuracy of 93% is pretty good. Taking a purchase (1) as the positive class, the first row of the confusion matrix shows 64 true negatives and 4 false positives, and the second row shows 3 false negatives and 29 true positives. With this accuracy, the classifier can be helpful for deciding which new customers to suggest the product to.
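The numbers in the confusion matrix can be unpacked by hand. The sketch below uses the matrix printed above and assumes scikit-learn's layout (rows are true classes, columns are predicted classes, labels sorted as 0 then 1). Precision and recall are added here as extra context; they were not computed in the original notebook.

```python
import numpy as np

# Confusion matrix printed above: rows = true class, columns = predicted class.
cm = np.array([[64, 4],
               [3, 29]])

tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all test points predicted correctly
precision = tp / (tp + fp)        # of predicted purchases, how many were real
recall = tp / (tp + fn)           # of real purchases, how many were found

print(f"accuracy={accuracy:.2f} precision={precision:.3f} recall={recall:.3f}")
```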

Step 5 - Visualize

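Using Matplotlib, we can plot the decision regions and see the false positives and false negatives as points that fall in the region of the other color. These can be analyzed further in the future. I can already see that redrawing the boundary could turn one or two of the false negatives into true positives.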

In [19]:
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
colors = ('red', 'green')  # class 0, class 1
# Grid covering the standardized feature range, padded by 1 on each side
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
# Predict a class for every grid point and shade the decision regions
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(colors))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Overlay the test points, colored by their true class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = colors[i], label = j)
plt.title('SVM (Test set)')
plt.xlabel('Age (standardized)')
plt.ylabel('Estimated Salary (standardized)')
plt.legend()
plt.show()